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fi ■ Abstract 

We propose an unconstrained stoehastic approximation method of finding the optimal mea- 
00 ' sure change (in an a priori parametric family) for Monte Carlo simulations. We consider different 

parametric families based on the Girsanov theorem and the Esscher transform (or exponential- 
tilting). In a multidimensional Gaussian framework, Arouna uses a projected Robbins-Monro 
^ I procedure to select the parameter minimizing the variance (see [2]). In our approach, the param- 

eter (scalar or process) is selected by a classical Robbins-Monro procedure without projection 
or truncation. To obtain this unconstrained algorithm we intensively use the regularity of the 
C^ . density of the law without assume smoothness of the payoff. We prove the convergence for a 

large class of multidimensional distributions and diffusion processes. 

We illustrate the effectiveness of our algorithm via pricing a Basket payoff under a multidi- 
^_ mensional NIG distribution, and pricing a barrier options in different markets. 

> 
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'^ '. 1 Introduction 

00 

O ■ 

The basic problem in Numerical Probability is to optimize some way or another the computation 
by a Monte Carlo simulation of a real quantity m known by a probabilistic representation 

m = E[F{X)] 

where X : ($7,^, P) -^ (E,\.\e) is a random vector having values in a Banach space E and 
F : E ^M. is a Borel function (and F{X) is square integrable). The space E is M but can also be 
a functional space of paths of a process X = (-^i)i(:ro,Tl- However, in this introduction section, we 
will first focus on the finite dimensional case E = M.'^. 

Assume that X has an absolutely continuous distribution ¥^{dx) = p{x)\d{dx) (A^ denotes 
the Lebesgue measure on (M'^,i3or(R'^))) and that Fg L2(P^.) with P(F(V) / 0) > (otherwise 
the expectation is clearly and the problem is meaningless). Furthermore we assume that the 
probability density p is everywhere positive on M*^. 

The paradigm of importance sampling applied to a parametrized family of distributions is the 
following: consider the family of absolutely continuous probability distributions ■7Tg{dx) := pQ{x)dx, 
£ Q, such that pe{x) > 0, Xd{dx)-a.e. . One may assume without loss of generality that is 
an open non empty connected subset of M'^ containing so that po = p. In fact we will assume 
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throughout the paper that G 
TTg, we have 



'. Then for any R -valued random variable X^ ' with distribution 



E[F{X)] = E 



F{X 



PeixW) 



(1.1) 



Among all these random variables having the same expectation m = K[F{X)], the one with 
the lowest variance is the one with the lowest quadratic norm: minimizing the variance amounts 
to finding the parameter 9* solution (if any) to the following minimization problem 

nimV{9) 



where, for every 9£ R'^, 

V{9) :=E 



F2(xW)^ 



•{xW) 

\{XW) 



E 



F^{X) 



Pe{X) 



< +C30. 



(1.2) 



A typical situation is importance sampling by mean translation in a finite dimensional Gaussian 
framework i.e. 



X(^) =X + 9, p{x) 



e 2 
(27r)2 



pg{x)=p{x-e) and V{9) = e^\^\'E\F'^{X)e 



~2{e,x) 



Then the second equality in (jl.2p is simply the Cameron-Martin formula. This specific framework is 
very important for applications, especially in Finance, and was the starting point of the new interest 
for recursive importance sampling procedures, mainly initiated by Arouna in [2] (see further on). 
In fact, as long as variance reduction is concerned, one can consider a more general framework 
without extra effort. As a matter of fact, if the distributions pg satisfy 



{1} VxgII 
[ii) VxG II 

and F satisfies E 



Q "-^ Pe{x) is log-concave 
\\m.\e\~*^ooPe{x) = Q, or 



Vx G R'^, limi. 



Pe(x) 



0, 



(Wi 



F2(X) 



PJX) 
Po{X) 



< -|-cx) for every 9 G R"^, then (see Proposition [T] below), the 

function V is finite, convex, goes to infinity at infinity. As a consequence ArgminT/ = {W = 0} 
is non empty. Assumption (ii) can be localized by by considering that one the two conditions 
holds only on a Borel set C of R'^ such that P^(C n {F / 0}) > 0. If 61 h^ pg{x) is strictly 
log-concave for every x in a Borel set B such that Px [B D {F ^ 0}] > 0, then V is strictly convex 
and ArgminT/ = {V^ = 0} is reduced to a single 9* £ R"^. These results follow from the second 
representation of V as an expectation in ()1.2p which is obtained by a second change of probability 
(the reverse one). For notational convenience we will temporarily assume that Argminl/ = {9*} 
in this introduction section, although our main result needs no such restriction. 

A classical procedure to approximate 9* is the so-called Robbins-Monro algorithm. This is a 
recursive stochastic algorithm (see ( AlgoRMJ ) below) which can be seen as a stochastic counterpart 
of deterministic recursive zero search procedures like the Newton-Raphson one. It can be formally 
implemented provided the gradient of the (convex) target function V admits a representation as 
an expectation. Since we have no a priori knowledge about the regularity of F (p) and do not wish 



When F is smooth enough alternative approaches have been developed based on some large deviation estimates 
which provide a good approximation of 6* by deterministic optimization methods (see [10]). 



to have any, we are naturally lead to formally differentiate the second representation of V in (jl.2 
to obtain a representation of W as 



VF(^) = E 



^ >Pe{X) pe{X) 



(1.3) 



Then, if we consider the function Hyi6,x) such that W{9) = E (^Hv{0,X)^ naturally defined 
by (jl.3p . the derived Robbins- Monro procedure writes 

On+l = On - Jn+lHvien,Xn+l), (AlgoRM) 

with (7n)n>o a step sequence decreasing to (at an appropriate rate), {Xn)n>o a sequence of i.i.d. 
random variables with distribution p(x)Xd{dx). To establish the convergence of a Robbins-Monro 
procedure to 0* = Argminy requires seemingly not so stringent assumptions. We mean by that: 
not so different from those needed in a deterministic framework. However, one of them turns out 
to be quite restrictive for our purpose: the sub-linear growth assumption in quadratic mean 

yeeR'^, \\Hv{e,x)\\ <c{i + \e\). (nec) 



which is the stochastic counterpart of the classical non-explosion condition needed in a determin- 
istic framework. In practice, this condition is almost never satisfied in our framework due to the 
behaviour of the term ^^^l as 6 goes to infinity. 

The origin of recursive importance sampling as briefly described above goes back to Kushner 
and has recently been brought back to light in a Gaussian framework by Arouna in [2]. However, 
as confirmed by the numerical experiments carried out by several authors ( [21 [T2| [T3] ) , the regular 



Robbins-Monro procedure (AlgoRM) does suffer from a structural instability coming from the 
violation of ()NEC|) . This phenomenon is quite similar to the behaviour of the explicit discretization 
schemes of an ODE = x = h{x) when h has a super-linear growth at infinity. Furthermore, in 
a probabilistic framework no "implicit scheme" can be devised in general. Then the only way 
out mutatis mutandis is to kill the procedure when it comes close to explosion and to restart it 
with a smaller step sequence. Formally, this can be described as some repeated projections or 
truncations when the algorithm leaves a slowly growing compact set waiting for stabilization which 
is shown to occur a.s.. Then, the algorithm behaves like a regular Robbins-Monro procedure. This 
is the so-called "Projection a la Chen" avatar of the Robbins-Monro algorithm, introduced by Chen 
in [Bl [7] and then investigated by several authors (see e.g. [llIlS) Formally, repeated projections 
"a la Chen" can be written as follows: 



7n+l 



Uk^ {On - Hv{en,Xn+l)} (AlgoP) 



where IIk^,„-^ denotes the projection on the convex compact Ko.(„) {Kp is increasing to M"^ as 
p -^ do). In [14] is established a a Central Limit Theorem for this version of the recursive vari- 
ance reduction procedure. Some extensions to non Gaussian framework have been carried out by 
Arouna in his PhD thesis (with some applications to reliability) and more recently to the marginal 
distributions of a Levy processes by Kawai in [12j . 

However, convergence occurs for this procedure after a long "stabilization phase" . . . provided 
that the sequence of compact sets have been specified in an appropriate way. This specification 
turns out to be a rather sensitive phase of the "tuning" of the algorithm to be combined with that 
of the step sequence. 

In this paper, we show that as soon as the growth of F at infinity can be explicitly controlled, it 
is always possible to design a regular Robbins-Monro algorithm which a.s. converges to a variance 
minimizer 9* with no risk of explosion (and subsequently no need of repeated projections). 



To this end the key is to introduce a third change of probabihty in order to control the term 
^^?'. . In a Gaussian framework this amounts to switching the parameter 9 from the density p to 

the function F by a third mean translation. This of course corresponds to a new function Hy but 
can also be interpreted a posteriori as a way to introduce an adaptive step sequence (in the spirit 

of m)- 

In terms of formal importance sampling, we introduce a new positive density qq (everywhere 
positive on {p > 0}) so that the gradient writes 



VV{0) = E 



F\xW) 



pgiXW)qe{XW) peiXW) 



p^iX^'^) Vpe{X^'^) 



E 



Hv{9,X^'^) 



(1.4) 



may seem complicated but the 



wYieie X'^^'i r^ q0{x)dx. The "weight" 

Pe{XW)qe{X^'^) Ve{XW) 

role of the density qg is to control the critical term — fx) (x) ^^ ^ (deterministic) quantity only 
depending on 0. Then we can replace Hy by a function H{9,x) = 6{6) Hv{6,x) in the above 



Robbins-Monro procedure ( AlgoRMJ ) where (5 is a positive function used to control the behaviour 
of Hv{0,x) for large values of x (note that {E[F(.,X(^))] = O) = {VV = 0}). 

We will first illustrate this paradigm in a finite dimensional setting with parametrized impor- 
tance sampling procedures: the mean translation and the Esscher transform which coincide for 
Gaussian vectors on which a special emphasis will be put. Both cases correspond to a specific 
choice of qg which significantly simplifies the expression of the weight. 

As a second step, we will deal with an infinite dimensional setting (path-dependent diffusion 
like processes) where we will rely on the Girsanov transform to play the role of mean translator. To 
be more precise, we want now to compute E[i<'(X)] where X is a path-dependent diffusion process 
and F is a functional defined on the space C([0,T],M"') of continuous functions defined on [0, T]. 
We consider a d-dimensional Ito process X = {Xt)t^\n^T] solution of the path-dependent SDE 



dXt = 6(i,X*)dt + cj(i,X*)dVFi, XQ = xe 



{Eb,cr, 



Wj 



where W = {Wt)te[o,T] is a g-dimensional standard Brownian motion, X* := {Xt^s)se[o,T] is the 
stopped process at time t, b : [0,T] x C([0,r],M'^) -^ R'^ and a : [0,T] x C([0,T],M'^) -^ M{d,q) 
are Lipschitz with respect to the || . ||^ on the space C([0, T], M'^) and continuous in (t, x) G [0, T] x 
C{[0,T],R'^) (see [IB] for more details about these path-dependent SDK's). 

Let if he a fixed borel bounded functional on C([0, T],]R ) with values in M{q,p) (where p > 
1 is a free integral parameter). Then a Girsanov transform yields that for every 6 G L'^p '■= 
L2([0,r],MP), 



E[F{X)] = E 



F(xW)e 



-/,fMxW.»)e(s),diy,>- 



^(x(»)^-)e||'2 



where X^^' is the solution to (-Efe+cri/^e.cr)- The functional to be minimized is now 

-2j^Mx«»'-)e{s),dWs)-y{x(oh-)e\\l2 



v{e) = E 



F(XW)2 



T,q 



eL 



T,P- 



In practice we will only minimize V over a finite dimensional subspace of E = spanjei, . . . , em} C 



7^2 

T,p- 



The paper is organized as follows. Section[2]is devoted to the finite dimensional setting where we 
recall the main tool including a slight extension of the Robbins-Monro theorem in the Subsection 12. II 
and the gaussian case investigated in [2] is revisited to emphasize the new aspects of our algorithm 
in the Subsection 12.21 

In Section 2 we successively investigate the translation for log-concave distributions probability 
and the Esscher transform. In Section [3] we introduce a functional version of our algorithm based 
on the Girsanov theorem to deal the SDE. In Section 4 we provide some comments on the practical 
implementation and in Section [5] some numerical experiments are carried out on some option pricing 
problems. 

Notations: • We will denote by 5 > the fact that a symmetric matrix S is positive definite. | . | 
will denote the canonical Euclidean norm on M™ and ( ., . ) will denote the canonical inner product. 

• The real constant C > denotes a positive real constant that may vary from line to line. 

• 11/11^2^ := (f^ flit) + ■■■ + f^{t)dt] ' if / = (/i, . . . , /p) is an M^-valued (class of) Borel func- 
tion(s). 



2 The finite-dimensional setting 

2.1 Argmin V as a target 

Proposition 1 Suppose l\TCi\j holds. 

Then the function V defined by /il.2\) is convex and lim|g|^+oo ^(^) = +oo. As a consequence 



Argminy = {W = 0} / 0. 



dna 



P{X) 



. Let X fixed in W'. The 



Proof. By the change of probability ^ we have V{e) = E F^(X)^^(^ 

function (9 h^ logpg{x)) is concave, hence log{l/pg{x)) = —logpg{x) is convex so that, owing to 
the Young Inequality, the function — l^ is convex since it is non-negative. 

To prove that V tends to infinity as \9\ goes to infinity, we consider two cases: 

- If lim pe{x) = for every x G M , the result is trivial by Fatou's Lemma. 

\e\-*+oo 



If lim 



pe{x) 



\e\^+coplj^{x) 



for every x G M"^, we apply the reverse Holder inequality with conjugate 



exponents {\,—\) to obtain 



v{e) > E 



>E 



F2/3(X) 
F2/3(X) 



v{X)pe{X) 
Pe/2iX) \i 



E 



Pe/2{X)J 



^p{X)pg{X)J 

{p and pg are probability density functions). One concludes again by Fatou's Lemma. ^ 



The set ArgminT/, or to be precise, the random vectors taking values in Argminy will the 
target (s) of our new algorithm. If V is strictly convex, e.g. if 

P[X€ {p.{^) strictly log-concave and F{x) ^ 0}] > 0, 

then ArgminT/ = {6*}. Nevertheless this will not be necessary owing to the combination of the 
two results that follow. 



Lemma 1 Let U : M — > M+ be a convex differentiable function, then 

ye,9'eR'^, {vu{e)-vu{9'),e-9')>o. 

Furthermore, if ArgmmU is nonempty, it is a convex closed set (which coincide with {VU = 0}) 
and 

\/ 9 £R'^\ ArgmmU, \/ 9* € ArgminC/, {VU{9),9-9*) > 0. 

A sufficient (but in no case necessary) condition for a nonnegative convex function U to attain 
a minimum is that limuj^o^ U{x) = +oo. 

Now we pass to the statement of the convergence theorem on which we wih rely throughout the 
paper. It is a shght variant of the regular Robbins-Monro procedure whose proof is rejected in an 
annex. 

Theorem 1 (Extended Robbins-Monro Theorem) Let H : MS x W^ — > M'* a Borel function and X 
an M. -valued random vector such that K[\H{9,X)\] < +oo for every 0G M . Then set 

V^gM'^, h{9) = E[H{9,X)]. 

Suppose that the function h is continuous and that T* := {h = 0} satisfies 

y9eM.'^\T*,y9*eT*, {9-9*,h{9)) >o. (2.5) 

Let 7 = (7n)n>i be a sequence of gain parameters satisfying 

^7n = +oo and ^7^<+oo. (2.6) 

n>l n>l 

Suppose that 

V^gM'^, E[\H{9,X)\^] <C{1 + \9\^) (NEC) 

(which implies \h{9)\^ < C(l + \9\^)). 

Let {Xn)n>i be an i.i.d. sequence of random vectors having the distribution of X, a random 
vector 9q, independent of {Xn)n>i satisfying E[|0oP] < +00, all defined on the same probability 
space {Vt,A,W). Then, the recursive procedure defined by 

9n = 9n-l - Jn+lH{9n,Xn+l), U > 1 (2.7) 

satisfies: 

39^:{Vt,A)^T*,9^^L^{¥), such that 9n^^9^. (2.8) 

The convergence also holds in LP(P), pG (0,2). 

The proof is postponed to the Appendix at the end of the paper. The natural way to apply 
this theorem for our purpose is the following: 

- Step 1: we will show that the convex function V in (jl.2p is differentiable with a gradient 
'SJV having a representation as an expectation formally given W{9) = E'^qv{9,X)\. 

- Step 2: then set H{9,x) := p{9)'Vov{9,x) where /? is a (strictly) positive function on M.''. As 
a matter of fact, with the notations of the above theorem 

{9 - 9*,h{9)) = 6{9){9 - 9*,VV{9)), 

so that T* = Argminy and (j2.5p is satisfied (set U := V in Lemma [T|). 

- Step 3: Specify in an appropriate way the function 5 so that the linear quadratic growth 
assumption is satisfied. This is the sensitive point that will lead us to modify the structure 
more deeply by finding a new representation of W as an expectation not directly based on 
the local gradient Vgv{9,x). 



2.2 A first illustration: the Gaussian case revisited 

The Gaussian is the framework of [5] . It is also a kind of introduction to the infinite dimensional 
diffusion setting investigated in Section [3l In the Gaussian case, the natural importance sampling 
density is the translation of the gaussian density: pe{x) = p{x — 9) for 6 E W^ {i.e. q = d). We have 

pe{x) = e 2-+<^>'^)p(a;). 



The assumption (?^) is clearly satisfied by the Gaussian density, and we assume that F satisfies 
E[F2(X)e-<'''^>] < +CX) so that V is well defined. 

In [2], Arouna considers the function Hv{G,x) defined by 



h2 



It is clear that the condition (jNEGp is not satisfied even if we simplify this function by e'^' '^ (which 
does not modify the problem). 

A first approach: When F{X) have finite moments of any order, a naive way to control directly 
||^y(^„,X„_|_i)||2 by an explicit deterministic function of 9 (in order to rescale it) is to proceed 
as follows: one derives from Holder Inequality that for every couple (r, s), r, s > 1 of conjugate 
exponents 

\\Hv{9,X)\\<\\{9-X)F'{X)\ 



,.,|2 



^^{B,X) 



Setting r = 1 + i and s = 1 + e, yields 

||^V'(6',X)|| <{\\XF'^(X)\\ 1 +\\f'^(X)\\ , |0|')e(t+=)l^l'. 
II ^^ ^ nb — \\\ ^ ^ii2(i+i) II V /|i2(i+i)i ly 

Then, H^{9,x) := e~^2+'^)l I Hv{9,x) satisfies the condition (INECh and theoretically the stan- 
dard Robbins-Monro algorithm implemented with H,; a.s. converges and no projection nor trunca- 
tion is needed. Numerically, the solution is not satisfactory because the correcting factor e~^2+^JI I 
goes to zero much too fast as 9 goes to infinity: if at any iteration at the beginning of the procedure 
9n is sent "too far" , then it is frozen instantly. If e is too small it will simply not prevent explosion. 
The tuning of e becomes quite demanding and payoff dependent. This is in complete contradiction 
with our aim of a self- controlled variance reducer. A more robust approach needs to be developed. 
On the other hand this kind of behaviour suggests that we are not in the right asymptotics to 
control ||^y(^n,^n+l)||2- 

Note however that when F is bounded with a compact support, then one can set e = and the 
above approach provides an efficient answer to our problem. 

A general approach: We consider the density 



By (11. 4p . we have 



VV{9) = E 



F^(X 



id) 



Vp(X(^) 



p(XW - 9) 



with X^^' ~ p{x + 9)dx, i.e. X^^' = X — 9. Since p is the Gaussian density, we have J^^^ 
As a consequence, the function Hy defined by 

Hv{9,x) = F^{x - 9){29 - x), 



p(x) - ^- 



provides a representation W{6) = 'E[Hv{9,X)] of the gradient W. As soon as F is bounded, 
this function satisfies the condition (jNECp . Otherwise, we note that thanks to this new change of 
variable the parameter 9 hes now inside the payoff function F and that the exponential term has 
disappeared from the expectation. If we have an a priori control on the function F{x) as \x\ goes 
to infinity, say 

3AgM+, VxgM"', |F(x)| < CFe^l^^l, 

then we can consider the function Hx{0,x) = e~''^'^'HY{9,x) which satisfies 

<c{i + \e\). 

The resulting Robbins-Monro algorithm reads 

e„.+l =en- Jn+ie^^^'"^F\Xn+l - 0„,)(20„ - Xn+l). 

We no longer to tune the correcting factor and one verifies on simulations that it does not suffer 
from freezing in general. In case of a too dissymmetric function F this may still happen but a 
self- controlled variant is proposed in Section 12.31 below to completely get rid of this effect (which 
cannot be compared to an explosion). 

2.3 Translation of the mean: the general strongly unimodal case 

We consider importance sampling by mean translation, namely we set 

VxGM'^, pg{x)=p{x-e), 

for deR'^. 

In this section we assume that 



p is log-concave and lim p{x) = 0, 

|x|— >oo 

so that d'HiD holds. Moreover, we make the following additional assumption on the probability 
density p 



r-, m u.u . I W ^^(x)=0(|x|'^-i) as |x| ^ oo 

3aG [1,2] such that < ^' p ^ ' ^' ' ' ' ' 

I (ii) 36 > 0, \ogp{x) + (5|x|" is convex. 

First we will use (11.21) to differentiate V since 



Proposition 2 Suppose l\7ii\j and {Tia ) are satisfied and the function F satisfies 



V^gM'^, E 



F^iX) 



p{X-i 



< +00 and VC>0, ¥. F'^{X)e 



.CIXI"-! 



Then V is finite and differentiable on M with a gradient given by 
VV{9) = E 



fHx) ,fJ^\, VpiX 



E 



F\x-e) 



p^{x-e) 

p^{X - 9) Vp{X - 20) 



p{X)p{X-29) p{X-26) 



in 



tr\ 



< +00. (2.9) 

(2.10) 
(2.11) 



Proof. The formal differentiation to get (j2.10p from (jl.2p is obvious. So it remains to check the 
domination property for 9 lying inside a compact set. Let xG M'^ and 0G W^. The log-concavity of 
p implies that 

{Vp{x-e),e) 



\ogp{x) < logp(x — 9) + 



p{x — 9) 



so that 



vix) \Vp{x 

< ^ ' ^. < exp ' ^^ 



p{x - 9) 



p{x — 6) 



Using the assumption (Wq^) yields, for every 0G B{0,R), 

p{X) 



F\X) 



p^X-9) 



VpiX -9)< F\X){A\X\--^ + i?)e(^l^-er'^+^)l^l 



< CRF\X)e 



c'\x\° 



--■.g{X)^L\¥) 



To derive the second expression ()2.1ip for the gradient, we proceed as follows: an elementary change 
of variable shows that 



VV{e) = f F^{x) 

JR'i 



p{x) 



E 



F'ix 



F^{X 



p'^{x — 9) 
_, p'^ix - 9) 



p^ix- 29) 

pHx- 



p{x)Vp{x — 9)dx 
Vp{x - 29)dx 



Vp{X - 29) 



p{X)p{X-29) p{X-29) 



Remark. The second change of variable (in (jl.2p ) has been processed to withdraw the parameter 
9 from the possible non smooth function F to make possible the differentiation of V (since p is 
smooth). The second expression ()2.1ip results form a third translation of the variable in order 
to plug back the parameter 9 into the function F which in common applications has a known 
controlled growth rate at infinity. 

This last statement may look strange at a first glance since 9 appears in the "weight" term of 
the expectation that involves the probability density p. However, when X = J\f{0; 1), this term can 
be controlled easily since it reduces to 



p^{x-9) Vp{x-29) 
p{x)p{x-29) p{x-29) 



e^ (29 -x). 



The following lemma shows that, more generally in our strongly unimodal setting, if l\7ii\i and 



{Tia) are satisfied, this "weight" can always be controlled by a deterministic function of 9. 



Lemma 2 // ( TiJ ) holds, then there exists two real constants A, B such that 

pHx-9) \Vp{x-29)\ ^ „„. _, _, 

p{x)p{x-29) p{x-29) - ^ ' ' ' ' ^ 



(2.12) 



Proof. Let / be the convex function defined on M by f{x) = logj3(rE) + 5|rE|°. Then, for every 
x, 9£R'i, 



log 



p'^jx - 9) 
p[x)p{x — 29) 



log 



fix -9) 
f{x)f{x-29) 



+ (5(|2;|" + |x-2^P-2|x 



Note that x — 9= 2(a^ + (a; — 20)). Then, using the log-convexity of/ and the elementary inequality 



(valid if a G (0,2]) yields 



\< + \v\'' <2 



/ 


U + V 


a 

+ 


U — V 


V 


2 




2 



P'^i^ - ^) < g25|0|" 



One concludes by the point (i) of {Ha)- 



p{x)p{x — 29) 




(2.13) 



Remark. Thus the normal distribution satisfies {'Ha) with a = 2 and 5 = 1/2. Moreover, note 



that the last inequality in the above proof holds as an equality. 

Now we are in position to derive an unconstraint (extended) Robbins-Monro algorithm to 
minimize the function V , provided the function F satisfies a sub-multiplicative control property, in 
which c > is a real parameter and F a function from M'^ to M+, such that, namely 



Vx, yG 



\F{x)\<F{x) and F{x + y) < C{1 + F{x)Y{l + F{y)Y 



E 



|X|2('^-i)F(X)4^ 



{nt) 



< +CX). 



Remark. Assumption (TiJ) seems almost non-parametric. However, its field of application is 



somewhat limited by (W^ ) for the following reason: if there exists a positive real number rj > 



such that X i— > logp{x) + r]\x\'^ is concave, then p{x) < Ce ^'^' {\x\ + 1) for some real constant 



C > 0; which in turn implies that the function F in {TC^J ] needs to satisfy F[x) < C'e^'^' for some 



b G (0, a) and some A > 0. (Then c = Cb with c;, = 1 if 6 G [0, 1] and ci, = 22 if 6 G (1, a), when 
a> 1). 



Theorem 2 Suppose X and F satisfy d'HiD , CH^a), 112. 9\) and {TC^J) for some parameters a G 
(0,2], 6g (0, a) and A > 0, and that the step sequence (7n)n>i satisfies the usual decreasing step 
assumption 

y^ 7n = +00 and ^ 7^_,_i < -|-oo. 



n>l 



n>l 



Then the recursive procedure defined by 

dn+l = On — "yn+lH{9n, Xn+l), Oq^M. 

where {Xn)n>i is an i.i.d. sequence with the same distribution as X and 

H(9 x) ■= ^'(j^-^) c-^^l^l- P'(^-^) yp{x-20) 
l + i?(_6i)2c p{x)p{x-29) p{x-29) ' 

a.s. converges toward an KigvainV -valued (square integrable) random variable 9*. 

Proof. In order to apply Theorem [H we have to check the following fact: 

- Mean reversion: The mean function of the procedure defined by (|2.14p reads 



(2.14) 



(2.15) 



h{9) =E[H{9,X)] 



g-25|6l|'' 

l + F{-9) 



2c 



VVi9) 
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so that r* := {/i = 0} = {W = 0} and if 6* £ T* and 0e M'^ \ T*, 



{d-9*,h{e)) 



1 + F(- 



\2c 



{\/v{e),e-e*) >o 



for every 6^9*. 

- Linear growth of 6 h^ \\H(9,X)\\2: All our efforts in the design of the procedure are motivated 
by this Assumption (jNECp which prevents explosion. This condition is clearly fulfilled by H since 



E[\H{9,X) 



-4<5|e|" 



(l+F(-e)2-)2 



-E 



F^{X - 9) 



pHx 



\Vp{X-29)\ 



p{X)p{X-2e) p{X-29) 



< Ce-^^I^I^E 



(1 + F{XYy{A{\X\''^' + \9\''^') + B) 



where we used Assumption (7i^^' ) in the first line and Inequality (j2.12p from Lemma [2] in the second 



line. One derives that there exists a real constant C > such that 

W.[\H{9,X)\f < C¥.\F{Xf^{l + |X|)2("-i)j (1 + |0|2{«-i); 



This provides the expected condition since [Tv^] holds. ^ 



Examples of distributions 

• The normal distribution. Its density is given on R by 

p{x) = (27r)-f e-l^l'/2, 



x£ 



so that ( |HiD is satisfied as well as (TL^a'^ for a = 2, 5 = ^. Assumption CH*^) is satisfied iff 
{h, A) G (0, 2) X (0, oo) U {2} x (0, i). Then the function H has a particularly simple form 

H{9, x) = e-^l^l V2(x - 9){2e - x) 

The hyper- exponential distributions 

p{x) = Cd,a,ae~^P{x), oG [1, 2] 

where P is polynomial function. This wide family includes the normal distributions, the 
Laplace distribution, the symmetric gamma distributions, etc. 

The logistic distribution Its density on the real line is given by 



\Hi\ is satisfied as well as (TVa) for a = 1 + r/ (r^E (0, 1)), 5 > 0. Assumption {Ti.^) is satisfied 



iff (6, A) G (0, 1) X (0, CX3) U {1} X (0, 1). 
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2.4 Exponential change of measure: the Esscher transform 

A second classical approach is to consider an exponential change of measure (or Esscher transform). 
This transformation has already been consider for that purpose in [12] to extend the procedure with 
repeated projections introduced in [2]. We denote by ip the cumulant generating function (or log- 
Laplace) of X i.e. ip{6) = logE[e^^'"*''^] . We assume that ■0(^) < +oo for every 6 ^ W^ (which 
implies that V' is an infinitely differentiable convex function) and define 



pe{x) 



'),x)—il' 



''%(-), 



X G 



Let X^ ' denote any random variable with distribution pQ. 
We assume that ■0 satisfies 

lim Me) - l-ipi-] =+oo and 3 5 > 0, 6* ^^ ^(6*) - 516*^ is concave. 

|6l|^+oo V2/ 



m 



One must be aware that what follows makes sense as a variance reduction procedure only if the 
distribution of X^^' can be simulated at the same cost as X or at least at a reasonable cost i.e. 



(2.16) 



where A" is a Borel subset of a metric space and g : M*^ x A" is an explicit Borel function. By (jl.2 
the potential V to be minimized is V{e) = E[F^{X)e~^^'^^+'f''^^^] . 



Proposition 3 Suppose ip satisfies {Ti'^'^) and F satisfies 

ye G R'^, E 



< +00. 



X|F2(X)e<^'^> 

Then (TCi) is fulfilled and the function V is differentiable on M with a gradient given by 
VV{9) = e[(VV'(^) - X) i?2(^)g-(e,x>+vKe)" 
{yi:{6)-X^-'y)F\X^~'~^) 



E 



^^(e)-^(-e) 



(2.17) 

(2.18) 
(2.19) 



where S/^{0) 



E[Xe('' 



^)1 



E[( 



.{e,x)] 



Proof. The function tp is clearly log-convex so that 6 h-> peix) is log-concave for every x£ M*^. On 
the other hand, by (IWf^ we have lim '^/f^,^^ = -|-oo for every xG M"^, and (\Hi\) is fulfilled. 



Pejx) 



The formal differentiation to get (j2.18p is obvious and is made rigorous by applying the as- 
sumption on F. The second expression (j2.19p of the gradient uses a third change of variable 



VV{9) 



{V^P{e) - x) F\x)e-^^'''^+'^^''^p{x)dx, 
(Vi>{6) - x) F\x)e^^^^-^^-'^^p_e{x)dx, 



E 



(vV'(0)-x(-^))f2(x(-^)) 



g^(e)-^(-0)_ ^ 



Theorem 3 We assume that ip satisfies (TC^'^) and F satisfies (|2.17|) and 

Vx G M^ \F{x)\ < Ce^l^l and 



E 



iXPe^l-^l 



< -Foo. 
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Then the recursive procedure 



On+1 = t 



jn+iH{dn, X^+i"^), n > 0, 9o€M'' 
where {£,n)n>i is an i.i.d. sequence with the same distribution as ^ in Ii2. 1 b]) and 

H{e,x) := e-^^^'^'^^~^^^F\x){Vi>{6) - x) 

a.s. converges toward an ArgminV -valued (square integrable) random vector 6* . 

Proof. We have to check the Hnear growth of the function 9 h^ ||-f^(^, X^~^)|| (condition ()NEC|) ). 
We have 



E 






< C7e^Av^lv^(-e)| (|vv(0)|2E 



=A|x(-'')| 



+ E 



|^M)|2gA|X(-«)|J^_ (2.20) 



First, by the following inequality 

d 



-Ax, 



y^ g-*'(Ejej(ej.2:>+Ejejc(ej:a;>) 



JC{l,...,d} 



we have e'^'"^' < Y^ gA(ej.a:^> where {ej)j = 1 if j G J or -1 if j G J'^. With this notation, 

JC{l,...,n} 

have 



we 



E 



„A|X(-«)| 



< E E 

JC{l,...,rf} 



,A(e,,,X(-'')>' 



JC{l,...,ci} 



By the concavity oi ip — 6\ . p , we have 

V-u, v£ W^, il){u + v) - V(^) < (V^('"),w) +5|vp 
so that 



E 



=A|X(-»)| 



< y- gA(VV'(-e),ej>+5A2|ej|2 ^Q^^gAVd|VV'(-e)|_ 
JC{l,...,d} 



(2.21) 



In the same way, we have 



E 



\X 



(-e)|2 A|x(-»)| 



,/C{l,...,rf} 

JC{l,...,<i} 



(2.22) 



Now, by differentiation of ij: it is easy to check that 



v^gm^ DV(e) = l^!^^^^;^fM^ - v^(^)«^ 
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which impHes 



E 



|^(Ae,;-e)|2 



Tr:{D'^^P{XeJ - 6)) + Tr(VV'(Aej - 0f 



The assumption {TC^^ ) imphes that < D'^^p(6) < 25 1^ (for the partial order on symmetric matrices 
induced by nonnegative symmetric matrices) then D'^Tp{6) is a bounded function of 0G M and in 
turn V(^) has a hnear growth by the fundamental formula of calculus. Consequently, for every 

Jcii,...,d), 



E 



|j^(Aej-0)|2 



<c{i + \e\^) 



Plugging this into 0:22]) and using (|2:2T]) and (12:201) we obtain E[|iJ(6l,X(~^))|2] < C (l + \e\^). 


3 Adaptive variance reduction for diffusions 

3.1 Framework and preliminaries 

We consider a d-dimensional Ito process X = {Xt)ti=[o,T] solution to the stochastic differential 
equation (SDE) 

dXt = b{t,X')dt + a{t,X')dWt, Xo = xeR'^, (.Et,a,w) 

where W = (M^t)tG[0,Tl is a g-dimensional standard Brownian motion, X* := (XtAs)se[o,T] is the 
stopped process at time i, b : [0, T] x C([0, T], M'') -^ W^ and a : [0, T] x C([0, T],^^) -^ M{d, q) are 
measurable with respect to the canonical predictable cr-field on [0, T] x C([0,T],M'^). For further 
details we refer to [B], p. 124-130. 

Thus, if b{t,x^) = l3{t,x{t)) and a{t,x^) = '&{t,x{t)) for every xe C{[0,T],R'^), X is a usual 
diffusion process with drift /? and diffusion coefficient Tl). 

If &(t,x*) =P{t,x{t)) andcj(t,x*) =^t,x{t)) for every xG C([0,r],M^) where t := [f j|, then 
X is the continuous Euler scheme with step T/n of the above diffusion with drift f3 and diffusion 
coefficient t?. 

An easy adaptation of standard proofs for regular SDE's show (see [18]) that strong existence 



and uniqueness of solutions for ( i?;, ^ ly ) follows from the following assumption 



{{i) fe(.,0) and <7(.,0) are continuous 
(ii) VtG [0,r], Vx, yGC([0,r],M'^), \b{t,y) - b{t,x)\ + \\a{t,y) - a{t,x)\\ <Cb,a\\x-y\L. 

Our aim is to devise an adaptive variance reduction method inspired from Section [5] for the 
computation of 

E[F(X)] 

where F is an Borel functional defined on C([0,T],M ) such that 

¥[F^{X)>0]>0 and F{X)e L^{¥). (3.23) 

In this functional setting, Girsanov Theorem will play the role of the invariance of Lebesgue measure 
by translation. The translation process that we consider in this section is of the form @{t, X^) where 
9 is defined for every ^G C([0,r],M'^) and ^G L^p by 

e{t,O-=^{t,e)0t, with ip:[0,T]xC{[0,T],R'')^M{q,p), 
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a bounded Borel function and 0G -^Tp (represented by a Borel function) for p > I. In the sequel, 
we use the foUowing notations (ptiO '■— '/'(^j?*); 

Gi := e{t, X'), ef := e(t, x(^)'*), and bJ"^^ := e(t, x(-^)'*), 

where X*-^^) denotes the solution to (-E'bio-e.o-.H/)- 

First we need the following standard abstract lemma. 

Lemma 3 Suppose {Hb^fj) holds. 

The SDE (-E'fe+o-e.o-,^) satisfies the weak existence and uniqueness assumptions and for every nan 

negative Borel functional G : C([0, T],]R '^^) — > M+ and f € C([0,T],M'') we have, with the above 

notations, 



E 



G(^X,j{fis,X'),dWs 



E 



G[x^'\j{f{s,x^'^n,dw,) + J{f,e){s,x^'^nds 

/„^(ei''\dH/.>-i||0W||^2 



X e 



T,q 



and 



E 



G[x^'\J{f{s,x('^n,dWs) 



E 



G(X, l{f{s,X'),dWs)- {f,e)is,X')ds 
Jo 



X e 



/„^(e.,dw'.>-i||e||22 



T,<3 



Proof. This is a straightforward application of Theorem 1.11, p. 372 (and the remark that imme- 
diately follows) in [16] once noticed that {t,u!) i— > 6(i,X*(a;)), {t,u;) i— > a{t,X^{u;)) and (t,^) i-^ 
Q(t,X^{uj)) are predictable processes with respect to the completed filtration of W. ^ 

Remarks. • The Doleans exponential I e ^-9 1 is a true martingale for any 

V /te[o,T] 

• In fact, still following the above cited remark form [16], the above lemma holds true if we replace 



G by any progressively measurable process O such that E 



62 



l/o^|0(.,c^)|2d; 



< +00. 



It follows from the first identity in Lemma [3] that for every bounded Borel function ip : [0, T] x 



C([0,T],M'^) -^ M{q,p) and for every 9eL 



E[F{X)] = E 



F{X 



T,p 



(9)^ 



■/„^(ei*\diy,>-|||0W| 



^T,q 



(set G{x,y) = F{x)). So, finding the estimator with the lowest variance amounts to solving the 
minimization problem 



min V{9) where V{e) := E 



F2(xW)e" 



2f^'{e^f\dWs)-\\em\ 



^T,q 



-2y{T)-\\ip{.,x-)e\\l2 

Using Lemma[3]with G{x,y) = F {x)e '^■■i and f = Q yields 



v{e) = E 



F^iX)e 



-/o^(0«,diy.>+i||e|i;2 



"T,9 



(3.24) 
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Proposition 4 Assume Ei^(X)^+'' < +cxd for some r] > as well as Assum,ptions \3. 23\) and 
(Hh^a-)- Then function V is finite on L'^„ and log-convex. 

(a) Assume that the bounded matrix-valued Borel function (p satisfies that Lp[s,X^) has a non- 
atomic kernel on the event {F{X) > 0} i.e. 

¥[{3ee Llp\{0} s.t. e{s)e Kerip{s,X') ds-a.e. and F^{X) > 0}] = (3.25) 

then for every finite dimensional subspace E <Z Lii,, lim ^(^) = +oo. If further- 



||6»|i 2 -^+oo,ee-B 



more 



inf / e{sY¥.[p{s,xy^{s,x')i{F2^x)>^}\e{s)<is>^, (3.26) 



then lim y{&) = +00. 



o ^+00 



(6) The function V is differentiable at every 6£ L'^p and the differential DV{9) G L'^j, is char- 



acterized on every ijj G L^ by 



(Dy(0),V^), 



E 



'T,p 



F2(X)e 



■ C{esAWs)+\\\ef. / '■^ 



"T,p 



(99(s,X^)V's,dt^.) 







E 



|e(-»)| 



F2(x(-^))e"" "^--(2(e(-^),v.(x(-^)'-)v^^ - / {^{x^-^^^n^sAWs)) 



(3.27) 



Remarks. • For practical implementation, the "finite dimensional" statement is the only result of 
interest since it ensures that Argmini^ 7^ 0. 

• li p = q and ip = Iq, the "infinite-dimensional" assumption is always satisfied. 



Proof, (a) As concerns the function V, we rely on Equality (|3.24|) . Set r = 1 + 2/r]. Owing to the 
Holder Inequality, showing that this function is finite on the whole space L'^ amounts to proving 
that 



E 



I T,q 



< e 



ll^llLl|e|l£2 r(r+l)/2 
T,p 



< +00. 



To show that V goes to infinity at infinity, one proceeds as follows. Using the trivial equality 



'f^{es,dWs) + ^\\e\\l2 / -1 rT/o ^w\_l1|1CiI|2 \2 1||c||2 



"T,q 



I j^{es,dWs)+^je\\\ Y zl!0lir2 



r Z \ 411^117-2 

^T,q e ^T,q 



and the reverse Holder inequality with conjugate exponents (3,-2) ^^ obtain 



V{9) > E 

>E 



liiieil 



T,q 



F'^l'\X)e 



E 



1 rT/, 



iiicii2 n -2 



Ji{@sAWs)-\\ml2 
e T.9 



' 11611,^2 



by the martingale property of the Doleans exponential. Let e > such that P[F^(X) > e] > 0. 
We have then ¥{9) > e^/^E 



i^lieil, 



L{F2(X)>e}f 



V{9) > e^/^E 

= E 



and by the conditional Jensen inequality 



ii4ii®ii4 k'(^)>e] 



^r[FHX)>e]E[\\e\\% l{^2(^)>,j]^3 



{F2(X)>e}' 



^T,q 
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Now 



E 



|0|^^ MFHX)>e} 



9{srE[ips{xrMx'n{F^ix)>e}] d{s)ds > o. 



'0 

The assumption (j3.25p implies that, for every 0G -^^p, 

9{srE[Mxr Mxn'i-{F%x)>e}] ois)d.s > f 0{srE[Mxr MxnMFHx)>o}] ^(*)d^ > O' 



so that if 6 runs over the compact sphere of a finite dimensional subspace E of L^ ^ 



inf 



II 2 =lfi&EjQ 



e{srE[ips{X'r iPs{X^)l{F2^X)>e}\ 0{s)ds > 0, 



SO that 



lim E 

T,p 



Ie||f2 1; 



UL^^ ^{FHX)>e} 



+ 00, 



and one concludes by Fatou's Lemma using that P[F^(X) > e] > 0. The second claim easily 
follows from Assumption (|3.26p . 

(6) As a first step, we show that the random functional ^{9) := 2 II0IL2 — /q {Q{s),dWs) from 



T,q 



LTp into L''(P) (rG [l,oo)), is differentiable. Indeed, it from the below inequality. 



ve, ^e Ll^^, M9 + V) - H9) - (D#(0), v^U I < Mt \\9\U 



T,p' 



T,p 



(3.28) 



where ^ ^ (D<5(^),V)^2 = f^' {ips{X')9{s),^s{X')i^{s)) ds - f^' {ips{X')tP{s),dWs) is clearly 



T,p 



bounded random functional from Lj.„ into L'^'(P), with an operator norm |||D<I>(0)|||^2 Lri]^\ < 



T,p' 



llv'lloo II^IL^ + Cp llv?!!^ (cpG (0, +00) (this follows from Holder and B.D.G. inequalities). 

Then, we derive that 9 1— > e**-^^ is differentiable form L"^ „ into every L''(P) with differential 
e*(^) D<l>(0). This follows from standard computation based on (j3.28p . the elementary inequality 



|e" - 1 - n| < ^^^(e" + e"") and the fact that 



T 



{iPs{x')i^{s)AWs 



^i[^(4>{x-)e{s)AWs) 



< 



{Mx'ms),dWs 



j;f(vs{x-)e{s),dw,) 



2p 



< Cp IIV^IL ll^ll^l J|</.||^2,^^ , 



where we used both Holder and B.D.G. inequality. 

One concludes that 6 ^ V{9) = E[F(X)2e*(^)] is differentiable by using the (L^ ^, L'"(P))- 

differentiability of e*'^-* with r = 1 + ^. 

The second form of the gradient is obtained by a Girsanov transform using Lemma [3l ^ 



3.2 Design of the algorithm 

In view of a practical implementation of the procedure we are lead to consider some non trivial 
finite dimensional subspaces E of L'^„. The function V being strictly log-convex on E and going 
to infinity as \\9\\^2 goes to infinity, 9£ E, the restriction of y on £^ attains a minimum 0^ which 

T,p 

de facto becomes the target of the procedure. Furthermore, for every 9 £ E, DV\e{9) = T)V{9)\e 



and the quadratic function L(9) :- 



^ ^T.p 



is a Lyapunov function for the problem. 
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Like for the static framework investigated in Section 12.31 our algorithm wih be based on the 
representation (j3.27|) for the differential D^ of V: in this representation the variance reducer 9 
appears inside the functional F which makes easier a control at infinity in order to prevent from 
any early explosion of the procedure. However, to this end we need to control the discrepancy 
between X and X^~^' . This is the purpose of the following Lemma. 

Lemma 4 Assume {Hh^^) holds. Let ip be a bounded Borel A4{q,p) -valued function defined on 
[0,T] X C([0, r],R'^), let 6 G L'^r, '^'^'^ ^^^ -^ ^''^d X^^' denote a strong solutions of Ehf^yy and 
Eb+a0,a,w driven by the same Brownian motion. Then, for every r > 1, there exists a real constant 
Cfjo- > such that 



sup \Xt — X^ 

tG[0,T] 



Wi 



< Cb„e^''-^ 



|a(s,X(^)'")ef)|ds 



(3.29) 



Proof. The proof follows the lines of the proof of the strong rate of convergence of the Euler 
scheme (see e.^. [3]). ^ 

The main result of this section is the following theorem. 



Theorem 4 Suppose that Assumption /i3. 23\) and (Hf,^^) hold. 

Let if be a bounded Borel M{q^p) -valued function (with p > 1) defined on [0,T] x C([0,T],R"') 
and let F be a functional F satisfying 



yx£C{[o,T], 



\F{x)\ <C^(l + ||x| 



(G 



F,X) 



for some positive exponent A > (then F(X) G L''(P) for every r > 0). Let E be a finite dimensional 
subspace of L^ spanned by an orthonormal basis (ei, . . . ,6^)- 
Let T] > 0. We define the algorithm by 

where 7 = (7n)n>i satisfies 112. 6\) . {W^"'')n>i is a sequence of independent Brownian motions for 
which X^~^"> = Q{—6n-,W^^~^^>) is a strong solution to (£'(,_o-e)^ ) and for every standard 
Brownian motion W , every J^^^ -adapted MP-valued process ^ = (.^t)te[o,T]; 



{Hx,r,{9,^,W),e. 



i/r.l 



-T,p 



^xA^,OF\Oe 



where for rj > 



* 



\v( 



Then the recursive sequence 
random variable 9*. 



^^^^"'"'^•''■^"^^'M2(e(.,e),^(.,e)e.),, 



T,q 



,0 



ll¥'lloolie|lr2 

T,p 

||2A+77 



i+llv(.,C-) 



-i\M\o.+VW 



if a is bounded, 



> if a is unbounded. 



{^is,^')ei{s),dWs 



7n n>l 



a.s. converges toward an AigiamV -valued (squared integrable) 



Remark. For a practical implementation of this algorithm, we must have for all Brownian motions 
^(n+i) ^ strong solution X^^^"-' of (£'f,_o-e) W^'""*"^')- In particular, this is the case if the driver ip 
is locally Lipshitz (in space) or if X is the continuous Euler scheme of a diffusion with step T/n 
(using the driver ip{t,x*) = f{t,x{t))). 

Note that if (p is continuous (in space) but not necessarily locally Lipshitz, the Euler scheme 
converges in law to the solution of the SDE. 
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Proof. When the diffusion coefficient a is bounded, it follows from Lemma S] that, for every r > 1, 



sup \Xt - XI 

te[o,T] 



{0)\ 



< Cha,T \\^\ 



I? W\\ 



where ||o-||^ = sup(j^^)g[o,T]xC([o,T],R'') l|o"(*>2;)||. 

First note that for every ^, ip^ E, the mean function h of the algorithm reads 



so that, for every 9^9*^, 
{hi9),9- 



{H^J9,X^-<^\W),^l;)^ 



'T,p, 



E 



'IMUMr 



"T,p 



l+||e(-e)||2A+r? 



{DV\Ei9),^P\ 



'T,p 



E 



-\MU\' 



^T,p 



l + ||eM)||2A+r, 

" T,q 



{DV\e{9),9-9%) 



'T,p 



>0. 



It remains to check that for every i G {!,... ,m}, \\Hx,r^i9, X^~^\W)\\^ < C(l + ||6'||^2 ) to 
apply the Robbins-Zygmund Lemma which ensures the a.s. convergence of the procedure (see 
SectionEI]). We first deal with the term F{X^-'^'))^ f^{ips{X(^-'^^'')ei{s),dWs). Let 7?' = ^ > 0. 



JO 



< 



< 



< 



F{X^-')f 

F{X^-e)f 
F{X^-')f 



2+17 



2+77' 



2+v 



{MX^~'^ne^{s),dW,) 



2(1 + 1/V) 



/ \ips{X(-'^'ne^{s)f ds 
Jo 



1+1/r;' 



m 



Now 



F{X 



(-e)x2 



< r /"i + llllx(-^)|| ||2^(^+''') 

2(1+,,/) - V III! lloolkAdW) 



<r n /1-lIIIIvII ||2A{i+»?') , 



\2X+Ti 



< C\,b,a,ip,T I 1 + 



One shows likewise that 



F(X( 



< C 



X,b,a,ip,T 



1 + 



||2A(l+r,') II ||2A(l+r?') n ||2A(l+r,') 

M^d, II r II 00 II llcx) 

T,p 



|2A 



"T,p 



Combining theses estimates shows that Hx^^{9^ X^ ' , W) satisfies the linear growth assumption in 

L2(P). 

If (7 is unbounded it follows from Assumption (Tib^a) that, for every (i,x)E [0,r] x C([0, T],]R'^), 



\\a{t,x)\\<C^{l + \\x\\J 
Elementary computation based on (j3.29p and Lemma [3] yield 

r-T 



O" 



(.,xW'^)e(^)| 



<c^-ll^lli II^IL 1 + 



T,p 



\X\\ e 



\w\\ 



■illfll4//o'(^.d^=) 



<C.||e|U IIV'IL l + e 



2rr' II^I'l2 






"T,p 



llr(l + r') 



<a 



r,b,a,ip 



-^T,p 
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for every r > (Assumption {TCb,a'l implies that ||||-'^||;^|| < +00 for every r > 0). Following the 



-(lklloo+l)l|8IL2 
e T.p 



same proof to the bounded case, we obtain easily the results with ^x,r]{0,S,) — \flii2^+i • 

T,q 

We conclude by noting that 77 is an arbitrary parameter to cancel the denominator. ^ 

Remark. If the functional F is bounded (A = 0), we prove in the same way that the algorithm 
without correction, i.e. build with ^a,?? = Ij a-s. converges. 

4 Additional remarks 

For the sake of simplicity we focus in this section on importance sampling by mean translation in a 
finite dimensional setting (Section 12. 3p although most of the comments below can also be applied 
at least in the path-dependent diffusions setting. 

4.1 Purely adaptive approach 

As proved by Arouna (see [2]), we can consider a purely adaptive approach to reduce the vari- 
ance. It consists to perform the Robbins-Monro algorithm simultaneously with the Monte Carlo 
approximation. More precisely, estimate ]E[F(X)] by 

where X/. is the same innovation as that used in the Robbins-Monro procedure 9^ = O^^i — 
jkH{Ok-i,Xk). This adaptive Monte Carlo procedure satisfies a Central Limit Theorem with the 
optimal asymptotic variance 

VN{SN-nF{X)])^Af{0,al), whith a^ = V{e*) -E[FiX)f . 

This approach can be extended to the Esscher transform when we use the same innovation ^^ 
(see (|2.16p ) for the Monte Carlo procedure (computing Xj^ ''" = g{9k-i,ik)) and the Robbins- 

Monro algorithm (computing Xj^ '"^ = g{—9k~i,S,k))- Likewise in the functional setting we 
can combine the variance reduction procedure and the Monte Carlo simulations using the same 
Brownian motion. 

In practice, it is not clear that this adaptive Monte Carlo is better than the naive two stage 
procedure: performing first Robbins-Monro with a small number of iterations (to get a rough 
estimate 0*), then performing the Monte Carlo simulations with this optimized parameter. 

4.2 Weak rate of convergence: Central Limit Theorem (CLT) 

As concerns the rate of convergence, once again this a regular stochastic algorithm behaves as 
described in usual Stochastic Approximation Theory textbooks like [13], [5], [8]. So, as soon as 
the optimal variance reducer set is reduced to a single point 9* , the procedure satisfies under quite 
standard assumptions a CLT. We will not enter into technicalities at this stage but only try to 
emphasize the impact of a renormalization factor g{9) like g{9) := e"^' ' or g[9) := — -- — — 
induced by the function F on the "final" rate of convergence of the algorithm toward 9* . We will 
assume that d = 1 and that X = J\f{0; 1) for the sake of simplicity. One can write 

H{9,x)=g{9)Ho{9,x) where Ho{9,x) = F^{x - 9){29 - x) 
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The function Hq corresponds to the case of a bounded function F (then A = 0). Under simple 
integration assumptions, one shows that V is twice differentiable and that 

V"{e) = e^E[F2(X)e-^^ (l + (^ - Xf 
Consequently the mean functions h and ho related to H and Hq which read respectively 
h{6) = g(^)e"l^lV'(0) and ho{x) = e-^^^'v'{6) 
are differentiable at 9* and 

h'{e*) = 5(r)e-i^*iV'(r) and /i'o(r) = e-i^*iV'(r) 

Now, general results about CLT say that if jn = g^^ a, f3 > with 

1 _ 1 



then 
where 



V^(en-r)^^'^AA(o;s;) 



a2 



E; = Var(W,Z))^-^^,^-,^. (4.31) 

The mapping a i— > Sq, reaches its minimum at a* = jjjgTx = /g,w , .g^^ leading to the minimal 
asymptotic variance 



s* = s*. 



h'{y*Y h'Q{y*Y E[F2(X)(X2-^*X + 1)]^ 

by homogeneity. 

So the optimal rate of convergence of the procedure is not impacted by the use of the nor- 
malizing function g{9). However, coming back to condition (|4.3U|) . we see that this assumption 
on the coefficient a is more stringent since ,n*\ > 1 (in practice this factor can be rather large). 
Consequently, given the fact that g{9*) is unknown to the user, this will induce a blind choice of 
a biased to higher values. With the well-known consequence in practice that if a is too large the 
^^CLT regime" will take place later than it would with smaller values. One solution to overcome 
this contradiction can be to make a depend on n and slowly decrease. 

As a conclusion, the algorithm never explodes (and converges) even for strongly unbounded 
functions F which is a major asset compared to the version of the algorithm based on repeated 
projections. Nevertheless, the normalizing factor which ensures the non-explosion of the procedure 
may impact the rate of convergence since it has an influence on the tuning of the step sequence 
(which is always more or less "blind" since it depends on the target 9*. In fact, we did not meet 
such difficulty in our numerical experiments reported below. 

One classical way to overcome this problem can be to introduce the empirical mean of the 
algorithm implemented with a slowly decreasing step "a la Rupert & Poliak" (see e.g. [Uj): Set 
'yn = :^, I <r <1 and 

^n+l •= r~; = Gn ~~r(^n — ^n), n > 

n + 1 n + 1 

where {9n)n>o denotes the regular Robbins- Monro algorithm defined by (j2.14p starting at 9q. Then 
(^n)n>o converges toward 9* and satisfies a CLT with the optimal asymptotic variance (|4.3ip . See 
also a variant based on a gliding window developed in 
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4.3 Extension to more general sets of parameters 



In many applications (see below with the Spark spread options with the NIG distribution) the 
natural set of parameters G is not M'' but an open connected subset of M''. Nevertheless, as 
illustrated below, our unconstrained approach still works provided one can proceed a diffeomorphic 
change of parameter by setting 

e = T{e), eee 

where T : W^ — > is a C^-diffeomorphism with a bounded differential (i.e. sup^; |||L'r(0)||| < +cxo). 
As an illustration, let us consider the case where the state function H{9,X) of the procedure 
is designed so that h{6) := K{H{9,X)) = p{6)W{9) where V is the objective function to be 
minimized over Q and p is a bounded positive Borel function. Then, one replaces H(9,X) by 
H{9,X) := DT{9).H(T{9),X) and defines recursively a procedure on M'^ by 

On+l = On — In+lH {9n, Xn+l) ■ 

In order to establish the a.s. convergence of On ■= T{9n) to Argminl/, one relies on a variant of 
Robbins- Monro algorithm, namely a stochastic gradient approach (see [HI [13] for further details) : 
one defines U{9) = V{T{9)) which turns out to be a Lyapunov function for the new algorithm since 

{VU{9),E{DT{9)H{T{9),X))) = p{T{9))\VU{9)\^ > on T~\{VV / 0}). 

If ?7 satisfies ||^(6i,X)||2 + |Vf7(^)| < C{l+U{9))^ (which is a hidden constraint on the choice of T) , 
one shows under the standard "decreasing" assumption on the step sequence that U{9n) — > Uoo G 
Li(P) and Enln+ip{9n)\VU{9n)\^ < +00. If lim V{0) 

one easily derives that dist(6'„, {V^ = 0}) — > a.s. as n — > oo. 



+00 or limini p{T{9))\VV{9)\^ > 0, 



5 Numerical illustrations 



5.1 Multidimensional setting: the NIG distribution 

First we consider a simple case to compare the two algorithms of Section [21 The quantity to 
compute is 



E[F{X)] = [ F{x) pnig(x; a, /3, 6, /x) dx, 

JR 



where pTsucixia, (3,S, p.) is the density of X a normal inverse gaussian (NIG) random variable of 
parameters {a,(3,5,p) i.e. a > 0, |/3| < a, 5 > 0, /u S M, 



pmG{x;a,P,5,p) 



aSKi (ay^(52 + {x - p^ 



J^+|3{x-^J) 



■Kyjd'^ + {x- pf 

where Ki is a modified Bessel function of the second kind and 7 = \fo? — ]fi. 

We can summarize the two algorithms presented in section [2l more precisely the variance 
reduction based on translation of the density (see Subsection 12. 3p and the one based on the Esscher 
transform (see Subsection [23]), by the following simplified (no computation of the variance) pseudo- 
code: 

Translation (see l2.3p Esscher transform (see [22 



for 


n = to M do 








X ~ NIG (alpha 


beta , mu , delta) 






theta = theta 


- l/(n + 1000)*Hl (theta , 


X) 


for 


n = to N do 








X ~ NIG (alpha 


beta , mu , delta) 






mean = mean + 


F(X) * p(X+theta)/p(X) 




[ 









for n = to M do 




X " NIG (alpha 


beta-theta, mu , delta) 


theta = theta 


- l/(n+1000)*H2(theta , X) 


for n = to N do 




X ~ NIG (alpha 


beta+theta , mu , delta) 


mean = mean + 


F(X) * exp(-theta*X) 


mean = mean * exp 


Jpsi (theta) ) 
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Translation case. We consider the function Hi of the Robbins-Monro procedure of the first 
algorithm defined by 

' piX) \p{x-2e)) ' 

where an analytic formulation of the derivative p' is easily obtained using the relation on the 
modified Bessel function K[{x) = ^Ki{x) — K2{x). 

apply. 



The assumption (W*^) is satisfied with a = 1, and our results of Subsection 
Esscher transform. In the Esscher approach we consider the function H2 defined by 

H2{9, X) = e~l^lF2(X) (VV'(0) - X) . 



Note that ip is not well defined for every 6 G 
of the NIG distribution is defined by 



. Indeed, the cumulant generating function 

for every 9 G (— a — /3, a— 13). Moreover, we need ip{—0) to be well defined i.e. 6 £ (— a+/3, a+ 
(3). To take account of these restrictions, we slightly modify the algorithm parametrization 



(see Subsection |13D 6 = T{9) := {P - a) 



y/l+¥ 



and update ^ S M in the Robbins-Monro 



procedure (multiply the function H2{T{9),X) by the derivative T'{9) = 02^3/2 



The payoff F is a Call option of strike K, F{X) = 50(e — K)^. The parameters of the NIG 
random variable X are q = 2, /? = 0.2, 5 = 0.8 and ^ = 0.04. The variance reduction obtained for 
different value of K are summarized in the tabular [TJ The number of iterations in the Robbins- 
Monro variance reduction procedure is M = 100 000 and the number of Monte Carlo iterations is 
N = 1000 000. Note that for each strike, the prices are computed using the same pseudo-random 
number generator initialized with the same seed. 









var. ratio. 




var. ratio 




K 


mean 


crude var 


translation 


{0) 


Esscher 


{0) 


0.6 


42.19 


8538 


5.885 


(0.791) 


56.484 


(1.322) 


0.8 


34.19 


8388 


7.525 


(0.903) 


39.797 


(1.309) 


1.0 


27.66 


8176 


9.218 


(0.982) 


32.183 


(1.294) 


1.2 


22.60 


7930 


10.068 


(1.017) 


29.232 


(1.280) 


1.4 


18.76 


7677 


9.956 


(1.026) 


28.496 


(1.268) 



Table 1: Variance reduction for different strikes (one dimensional NIG example) 



To complete this numerical example. Figure [T] illustrates the densities obtained after the Rob- 
bins-Monro procedure. The deformation provided by the Esscher transform is very impressive in 
this example. We remark that the Esscher transform modifies the parameter (3 which controls the 
asymmetric shape of the NIG distribution. 

Spark spread 

We consider now a exchange option between gas and electricity (called spark spread). We choose 
to model the price of the energy by the exponential of a NIG distribution. A simplified form of 
the payoff is then 



F{X) = 50(e 



X'^ 



ce 



J^gaa 



K). 
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Figure 1: Densities of X (crude), X + 9 (translation) and X^ > (Esscher) in the case K = 1. 



where X''^"'' ~ iV/G(2, 0.2, 0.8,0.04) and X^"" ~ iV/G(1.4, 0.2, 0.2, 0.04) are independent. 

The results obtained for different strikes after 300 000 iterations of the Robbins-Monro procedure 
and 3 000 000 iterations of Monte Carlo, are summarized in the Table [2l 



K 


c 


mean 


crude var 


var. ratio, 
translation 


var. ratio 
Esscher 


0.4 


0.2 


41.021 


8540.6 


5.0118 


25.171 




0.4 


32.719 


8356.9 


5.1338 


27.006 




0.6 


26.337 


8112.2 


4.9752 


28.062 




0.8 


21.556 


7845.3 


4.7569 


29.964 




1 


17.978 


7582 


4.5575 


32.849 


0.6 


0.2 


33.235 


8378.4 


5.2609 


27.455 




0.4 


26.534 


8133.3 


5.0604 


28.669 




0.6 


21.587 


7862.7 


4.8046 


30.649 




0.8 


17.931 


7595.2 


4.5839 


33.656 




1 


15.184 


7344.2 


4.4064 


37.489 


0.8 


0.2 


26.908 


8160.1 


5.1366 


28.876 




0.4 


21.725 


7884.9 


4.844 


31.018 




0.6 


17.955 


7612.5 


4.6031 


34.166 




0.8 


15.156 


7357.3 


4.416 


38.167 




1 


13.027 


7123.9 


4.2685 


42.781 



Table 2: Variance reduction for different strikes (spark spread example). 

5.2 Functional setting: Down &; In Call option 

We consider a process {Xt)t>o solution of the following diffusion 

dXt = h{Xt) dt + a{Xt) dWu Xo = xo G M. 

A Down & In Call option of strike K and barrier L is a Call of strike K which is activated when 
the underlying X moves down and hits the barrier L. The payoff of such a European option is 
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defined by 



Fix) = (Xt - K)+l 



I min Xt< l\' 
lo<t<T ) 



A naive Monte Carlo approach to price this option is to consider an Euler-Maruyama scheme X = 
{Xtj^)i^^iQ^ n} to discretize X and to approximate min Xt by min Xtf,. It is weh known that 

this approximation of the functional payoff is poor. More precisely, the weak order of convergence 
cannot be greater than ^ (see [Tl]). 

A standard approach is to consider the continuous Euler scheme X^ obtained by extrapolation 
of the Brownian between two instants of discretization. More precisely, for every t S [tk,tk+i], 

Xt = XI + h{Xl)(t - tfc) + a{Xl){Wt - mj, XI = xo G M. 

By preconditioning, 



E[F(X)] = E 



N-\ 



{xt-k)A\- np(^t,,xt,^j 



fc=0 



(5.32) 



withp(xfc,a;fc+i) = P 



min XI > L 



(^tfc'^tfe+l) = {xk,Xk+l) 



_te[ifc,tfc+i 
orem and the law of the Brownian bridge (see for example [9] ) , we have 



. Now using the Girsanov The- 



p{Xk,Xk+l) = 1 -P 

'o 



mm Wt < — 7 — r- 

te[o,ti] o-(xfc) 



Wt, 



^k+l ~ ^k 



a{xk) 
if L > mm{xk,Xk+i), 



(5.33) 



2^L-xf,)(L-x^._^_J^) 
1 _ e -^(-fe)(*fe+i-tfc) ^ otherwise. 

In the following simulations we consider an Euler scheme of step t^ = k— with n = 100. 

Deterministic case (trivial driver ip = 1) 

We consider three different basis of L'^{[0, 1],M) 

- a polynomial basis composed of the shifted Legendre polynomials Pn{t) defined by 

1 d" 



Vn> 0,ViG [0, 1], Pn{t) = Pn{2t - 1) where Pn(t) 



2*^71! dt 



-{(t'-ir). (ShLeg) 



the Karhunen-Loeve basis defined by 

Vn> 0,Vt G [0,1], e„(t) = V2 sin ( (n + -) irU 



the Haar basis defined by 



where ip{t) = < 



Vn> 0,V/c = 0,...,2"- l,ViG [0,1], ?/'n,fc(t) = 22^(2^=* - ?i) 

'l iftG[0,i) 
-1 iftG[i,l) 
otherwise 



(KL) 



(Haar) 
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Black&Scholes Model 

First, we consider the classical Black&Scholes model. We set the interest rate r to 4% and the 
volatility a to 70% (which is a high volatility). The strike of the payoff F is set at K = 115 and 
the barrier level at L = 65. A crude Monte Carlo (with Brownian bridge interpolation, see ()5.32p ) 
give a price of 2.596 with a variance of 230 after 500 000 trials. Note that the true price of this 
product is 2.554. 

For different basis, the results of our algorithm are summarized in the table [5^21 In the Robbins- 
Monro procedure, we define the step sequence by 7^ = , ,^„„a and set the number of iterations at 
50 000. 







Basis 


Dim. 


Mean 


CI 95% 


Variance ratio 


Constant 


1 


2.5737 


±0.0230 


3.4710 


ShiftLegendre 


2 

4 

8 


2.5741 
2.5717 

2.5717 


±0.0197 
±0.0193 
±0.0193 


4.7225 


dShLeg) 


4.9478 




4.9494 


Karhunen-Loeve 
(|KT4 


2 

4 
8 


2.5678 
2.5729 
2.5705 


±0.0164 
±0.0160 
±0.0156 


6.8644 
7.1851 
7.5218 


Haar 


2 

4 
8 


2.5657 
2.5671 
2.5663 


±0.0192 
±0.0163 
±0.0155 


4.9710 


(|Haarp 


6.9459 
7.6574 



Table 3: Variance ratio obtained for different basis in the Black&Scholes model {K 
variance of the crude Monte Carlo: 230). 



115, L = 65, 



In figure [2] are depicted the optimal variance reducer when the optimization of V is carried out 
on Em for several values of m (2, 4 and 8) in the different basis mentioned above. 

A local volatility Model 

To emphasize the generic feature of our algorithm we consider the same product in a local volatility 

model (inspired by the CEV model) defined by 

/9 ^t 



dxj = rxt dt ± ax 



x/TT 



AWt 



(5.34) 



with r = 0.04, ct = 7 and /? = 0.5. 

The price of the Down & In Call (strike 115, barrier 65) given by a crude Monte Carlo with 
Brownian interpolation after 500 000 trials is 3.194 and the variance is 206.52. 



Adaptive case (non-trivial driver) 

We experiment now our algorithm with a non-trivial driver ip defined for t 



tk by 



fe-i 



^(i,e* 



Pk 1 - Pfc ) , with Pfc = fl p{itj , 6j+i ) , 



j=0 



where p is defined by (15.33p . Note that pk = P[mintg[o^ij.] £,t> L\^q,. . . ,it^ so that there is no 
extra-computation compared to the Brownian bridge interpolation. 

We set p = 2 and E = (MI[o,t])^ so that the optimal parameter 6t^ = apk ± /3(1 — pk) with 
{a, P) S M?. The results for different strikes and barrier levels are reported in Table [5] for the 
Black&Scholes model and in Table [6] for the local volatility model. The simulation parameters are 
unchanged. 
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Figure 2: Optimal 6 process obtained with different basis by our algorithm using 50 000 trials. 



Basis 


Dim. 


Mean 


CI 95% 


Variance ratio 


Constant 


1 


3.1836 


±0.0251 


2.6297 


ShiftLegendre 


2 

4 
8 


3.1830 
3.1815 
3.1813 


±0.0223 
±0.0215 
±0.0215 


3.3258 


(ShLeg) 


3.5670 




3.5659 


Karhunen-Loeve 
(|Kh]) 


2 

4 
8 


3.1852 
3.1862 
3.1918 


±0.0187 
±0.0183 
±0.0178 


4.7254 
4.9385 
5.2183 


Haar 


2 

4 
8 


3.1834 
3.1871 
3.1864 


±0.0215 
±0.0186 
±0.0177 


3.5699 


(IHaarp 


4.7896 
5.2675 



Table 4: Variance ratio obtained for different basis in the local volatility model (15.341) (K = 115, 
L = 65, variance of the crude Monte Carlo: 206.52). 
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Strike 


Barrier 


Mean 


CI 95% 


Variance ratio (Crude) 


a 


/3 


85 


65 


2.5738 


±0.0115 


13.49 


(16.56) 


-0.1752 


1.6685 




75 


6.0489 


±0.0186 


14.26 


(43.39) 


0.0493 


1.9191 


95 


65 


2.5704 


±0.0110 


14.64 


(15.26) 


0.0524 


1.9987 




75 


6.0492 


±0.0190 


13.67 


(45.25) 


0.1557 


2.0560 




85 


11.5970 


±0.0301 


12.23 


(112.92) 


0.4108 


2.1226 


105 


65 


2.5687 


±0.0122 


12.03 


(18.56) 


0.3888 


2.1423 




75 


6.0548 


±0.0206 


11.66 


(53.08) 


0.3895 


2.1720 




85 


11.5953 


±0.0308 


11.67 


(118.32) 


0.4524 


2.1608 




95 


19.2882 


±0.0348 


17.17 


(151.04) 


0.6619 


1.7910 


115 


65 


2.5706 


±0.0135 


9.75 


(22.90) 


0.5473 


1.8903 




75 


6.0530 


±0.0211 


11.16 


(55.42) 


0.4591 


1.9371 




85 


11.5976 


±0.0297 


12.55 


(109.98) 


0.4807 


2.0008 




95 


19.2958 


±0.0347 


17.21 


(150.67) 


0.7217 


1.6380 



Table 5: Variance reduction for different strikes and barrier levels in the Black&Scholes model. 



Strike 


Barrier 


Mean 


CI 95% 


Variance ratio (Crude) 


a 


/3 


85 


65 


3.1827 


±0.0127 


10.02 


(20.28) 


-0.3057 


1.5522 




75 


6.4115 


±0.0190 


9.96 


(45.03) 


-0.1428 


1.7985 


95 


65 


3.1846 


±0.0124 


10.65 


(19.08) 


-0.1141 


1.9139 




75 


6.4117 


±0.0199 


9.07 


(49.42) 


-0.0029 


1.9814 




85 


11.4478 


±0.0293 


8.03 


(106.99) 


0.1898 


1.8937 


105 


65 


3.1835 


±0.0135 


8.98 


(22.65) 


0.1487 


1.9628 




75 


6.4120 


±0.0209 


8.21 


(54.59) 


0.1493 


2.0060 




85 


11.4458 


±0.0295 


7.88 


(108.94) 


0.2503 


1.8737 




95 


18.6060 


±0.0345 


9.83 


(149.07) 


0.5594 


1.4343 


115 


65 


3.1817 


±0.0148 


7.38 


(27.54) 


0.3062 


1.6884 




75 


6.4112 


±0.0209 


8.18 


(54.79) 


0.1928 


1.8119 




85 


11.4470 


±0.0289 


8.24 


(104.16) 


0.2599 


1.7430 




95 


18.6061 


±0.0346 


9.79 


(149.76) 


0.5755 


1.4313 



Table 6: Variance reduction for different strikes and barrier levels in the local volatility model. 
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6 Appendix: proof of Theorem [1] 

We propose below the proof of the slight extension of the regular Robbins-Monro algorithm when 
{h = 0} is not reduced to a single equilibrium point. The key is still the convergence theorem for 
non negative super-martingales. 

Proof. Set J^„ := a{eo, Zi, . . . , Z„), n > 1. Let 9* e T* . Then 

= \en - r |2 - 2-/n+l{9n " 6*, i7(0„, Z„+i)) + J^+- 

< lOn - r|2 - 27„+i(0„ - e*,h{en)) - 27n+l(en " r , AM„+i) + 7^+1 [//(&„, Z„+i 



*|2 o«, /D z3* utD \\ o«, /n D* A i\/f \ I «,2 I TT / a v m2 



(6.35) 
where 

AM„_|_i = H{9n, Zn+l) — ¥.[H{9n, Zn+l) \ J^n] = H{6n, -Z'n+l) — h{9n), 

is an increment of (local) martingale satisfying lE[|AAf„+i|^] < C(l + IE[|^n — 0*|^]) owing to the 
assumptions on H and Schwarz Inequality which also implies that 

M{0n-e*,H{9n,Zn+l)W<\{^[\9n-9*\^]+^[\H{9n,Zn+^)\'']) < C(l + E [[&„ - r [^j , 
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for an appropriate real constant C. Then, one shows by induction on n from (j6.35p that \9n\ is 
square integrable for every n > and that AMn+i is integrable, hence a true martingale increment. 
Now, one derives from the assumptions (j2.6p and (j6.35p that 

UUa + Cjl) 

is a (non negative) super-martingale with Sq = \9o — 9*\'^ (i L^(¥). This uses the mean-reverting 
assumption (j2.5p . Hence Sn is P-a.s. converging toward an integrable r.v. S^. Consequently, using 
that Efc>n+i Ik -^ 0' one gets 

n-l 

\9n - r|2 + 2Y,lk+i{0k - e\ h{9k)) ^S^ = S^\{[1 + C,7^)G Li(P). (6.36) 

fc=0 n>l 

The super-martingale {Sn) being L^-bounded, one derives likewise that (|^n.— ^*P)n>o is L^-bounded 
since 

n 

\9n-9*\''<J{{l + C^^l)Sn, n>0. 
fc=l 

Now, a series with nonnegative terms which is upper bounded by an {a.s.) converging sequence, 
a.s. converges in ]R_|_ so that 

Y,ln+l{9n-9\h{9n))<+'X P-a.S. 

n>0 

It follows from (j6.36p that, P-a.s., |^„ — 9*\^ "^zl^ ]^^ which is integrable since (|6'„ — 9*\'^)n>o is 
L^ -bounded and consequently a.s. finite. 
Let L > 0. Set 

O^ := {ioe n, Vn > 0, \9n{io) - 9*\ < L} . 

It follows from the a.s. finiteness of L^ that IJl>o ^l ~ ^ '^■^■- Now we consider the compact set 
K^ = T* n-B(0, L). It is separable so there exists an everywhere dense sequence in K^, denoted for 
convenience (0*'*^)fc>i- The above proof shows that P-a.s., for every A: > 1, \9n — 9*'^\'^ -^ L^ < -|-oo 
as n — > oo. Then set 



n' 



iuen^, |e„H-r'Y"=>°"Lt,(a;),A;>l, ^7„(0„„iH - r , M^n-iM) < +oo 



L 

n>l 



which satisfies P(^' ) = P(r2^). Assume co £ 0,' . Up to two successive extractions, there exists a 
subsequence 9^(^n.u) such that 

{G<t>{n,u) - G*,Hd<i>{n,oj){^))) "^^ and 6'^(„,c^)(w) "-^ G^i}^)- 

The function h being continuous {9^{uj) — 9* ,h{9^{uj))) = which implies that 9^{uj)£ {h = 0}. 
Hence 9^{uj)£ K^. Then any limiting value 9'^{uj) of the sequence {9n{uj))n>i will satisfy 



VA: > 1, \9' (lo) - 9*'"] = \9^{lo) - 9*'"] = \ L^ {. 



w 



which in turn implies that 9'^{uj) = 9^{uj) by considering a subsequence 9*''^ — > 9^{uj). So, 9^{ijj) 
is the unique limiting value of the sequence {9n{'^))n>o i-e- 9n{oj) — > 9^{uj) as n ^ oo. The fact 
that the resulting random vector 9^ is square integrable follows from Fatou's Lemma and the 
L^-boundedness of the sequence {9n — 9*)n>i. <} 
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